ABSTRACT

Dissertations are the cumulative, tangible "best 
evidence" of interests of doctoral faculty and students in serious 
and incisive scholarship. Thus, dissertations are thoroughly studied
by the program review teams periodically hired by boards of higher
education in most states. The present paper explores seven errors in
quantitative analysis in both published literature and dissertations.
They are associated with beliefs concerning statistical significance
testing, multivariate statistics, chi-square analysis, covariance
statistical corrections, stepwise analytic methods, the psychometric
integrity of instrumentation, and presentation of statistically
nonsensical or impossible results. The errors are described using
references to works by other authors and small hypothetical data sets
to illustrate problems. Concrete examples of the errors as they occur
in dissertations are cited to make clear that the errors are not
hypothetical. Ten dissertations, completed since January of 1985, are
cited as examples, although pseudonyms are used. These common
methodological errors in dissertations may reflect poor training in
research. A 76-item list of references, 14 data tables, and three
figures are provided. A bibliography of the 89 dissertations from
which examples are taken is appended. (TJH)
ABSTRACT

Dissertations are the cumulative, tangible "best evidence" 
of interests of doctoral faculty and students in serious and 
incisive scholarship. Thus, dissertations are thoroughly studied 
by the program review teams periodically hired by boards of
higher education in most states. The present paper explores seven 
errors in quantitative analysis in both published literature and 
dissertations. The errors are explained in detail using 
references to works by other authors and small hypothetical data 
sets to illustrate problems. Concrete examples of the errors as 
they occur in dissertations are cited to make clear that the 
errors are not hypothetical. Ten dissertations, completed since 
January of 1985, are cited as examples, although pseudonyms are 
employed to avoid embarrassment of the students or their doctoral
advisors. The discussion should also be useful to authors of 
published research who may wish to avoid the same errors. 
Probably the most fundamental challenge confronting doctoral
programs is maintaining dissertation quality while simultaneously
respecting student prerogatives. Most accreditation regulations
require that students be afforded substantial influence over the
selection of advisors and dissertation committees. Although the
tension between expectations for quality and respect for
student freedom creates a difficult dilemma for doctoral faculty,
the dilemma is not one that faculty can afford to ignore. As
Thompson (1987a, p. 1) notes:

Even if a faculty member's own inherent interest
in scholarship is not sufficient to warrant
interest in the quality of dissertations being
produced under the faculty member's own direction
or under the direction of colleagues, interest in
the survival of the program may itself warrant
concern.

External review teams that make recommendations to state boards 
regarding program continuation or termination do tend to pay 
disproportionate attention to dissertation quality when making 
their recommendations.

Review teams quite reasonably feel that dissertations are
the cumulative, tangible "best evidence" of faculty and student
interest in serious and incisive scholarship. For example, a
recent review of all education doctoral programs in Louisiana yielded
conclusions that the programs at the University of New Orleans
"produce the best dissertations" (Brown, Cooper, Griffiths, Howey
& Lilly, 1985, p. 80) in the state. But the reviewers
nevertheless offered the following observations regarding how
good the best dissertations in the state are:

Dissertations [at UNO] are weak. With all of the
improvements noted in the College, it is
paradoxical that the dissertations remain so weak.
(It is acknowledged that a few dissertations
are excellent, but the majority are weak.) ...The
problems with dissertations can be traced to poor
training in research and inept supervision. (Brown
et al., 1985, pp. 38-39)

The purpose of the present paper is not to propose
mechanisms for improving dissertation quality; such proposals are
available elsewhere (cf. Thompson, 1987a). Rather, the purpose of
the paper is to identify common methodological errors in
dissertations that may reflect "poor training in research." Each of
the seven errors is explained, and examples of dissertations
illustrating the errors are cited so as to moot any argument that
the errors are purely hypothetical.

A Preliminary Caveat: The Role of Statistical Method in Inquiry

However, one preliminary caveat is in order: methodological
integrity is not the ultimate sine qua non of research, published
or otherwise. Certainly it is true that "Although the quality of
educational research is improving, evidence still indicates that
much of the research published has important weaknesses" (Borg,
1983, p. 193). Empirical studies of methodological practice in
published research confirm these general impressions (Persell,
1976; Wandt, 1965; Ward, Hall & Schramm, 197^3). Some of the
problems in the quality of the research literature can be
attributed to the journal review process, studied in an
intriguing fashion by Peters and Ceci (1982). Nevertheless, as
Glass (1979, p. 12) suggests, "Our research literature in
education is not of the highest quality, but I suspect that it is
good enough on most topics."

Even studies with methodological weaknesses can make
noteworthy contributions to understanding of educational
phenomena, i.e., to theory. Reasonable people can disagree about
the role of theory in research. For example, Scriven (1980, p.
18) argues that "In the practical sciences we are looking for
solutions to problems, not just explanations of the failures that
led to the problems... It does not take a theory."

But theory building is the ultimate objective of good
science. As Kerlinger (1977, pp. 5-6) notes, "Science, then,
really has no other purpose than theory, or understanding and
explanation." As Gergen (1969, p. 13) explains, theoretically
oriented research "not only satisfies our curiosity, but also has
the advantage of maximum heuristic value. It leads to new
investigations and suggests interesting links to other areas of
concern." As Thompson (in press) argues,

...when Jenner discovered many years ago that
milkmaids did not get smallpox if they had been
exposed to cowpox, he had the basis for suggesting
a possible cure for smallpox. But absent any
understanding of the mechanics of the cure, if he
had then attempted to identify a cure for polio,
his original discovery would have been of no
assistance at all.

1. Dissertations should reflect the limited contribution that
statistical significance testing can make to the
interpretation of results.

Few methodological offerings have sparked more controversy
than Sir Ronald Fisher's promulgation of significance testing
methods, methods that apparently were developed prior to Fisher's
work (Carlson, 1975). The past 30 years have involved periodic
efforts "to exorcise the null hypothesis" (Cronbach, 1975, p.
124). Morrison and Henkel (1970) and Carver (1978) provide
historically important and incisive explanations of the limits of
significance testing as an aid to interpretation. More recent
informative treatments are available from Dar (1987), Huberty
(1987), Kupfersmid (1988), and Thompson (1987b, 1988c).

Most researchers have been taught that the statistical
significance of results does not inform the researcher regarding
the importance of outcomes. Shaver (1985, p. 58) makes this point
in a concrete fashion in his contrived dialogue about
significance testing:

Chris: [Looking puzzled.] Well, as I said, it [my
result] was statistically significant. You
know, that means it wasn't likely to be just
a chance occurrence... An unlikely
occurrence like that surely must be
important.

Jean: Wait a minute, Chris. Remember the other day
when you went into the office to call home?
Just as you completed dialing the number,
your little boy picked up the phone to call
someone. So you were connected and talking
to one another without the phone ever
ringing... Well, that must have been a
truly important occurrence then?

Yet actual behavior belies, in three ways, a genuine acceptance
that significance testing does not inform decisions regarding the
importance of results. First, journal editorial boards tend to
perceive articles that report significant results more favorably
than articles not reporting significant results (Atkinson, Furlong
& Wampold, 1982). Second, readers of research findings tend to
perceive articles reporting statistically significant results more
favorably (Cohen, 1979). Third, and most disturbing of all,
authors tend not to submit manuscripts in which nonsignificant
results must be reported, and even tend to abandon lines of inquiry
on the basis of such results (Greenwald, 1975). These behaviors are
too readily transmitted to doctoral students.

Too few researchers appreciate which study features 
contribute to statistical significance. Although significance is 
a function of at least seven interrelated features of a study 
(Schneider & Darcy, 1984), sample size is the primary influence 
on significance. Some example results may clarify the ways in 
which sample sizes affect significance tests. 

Tables 1 and 2 present significance tests associated with
varying sample sizes and either moderate (9.8%) or larger (33.6%)
fixed effect sizes, respectively. The tables can be viewed as
presenting results for either a multiple regression analysis
involving two predictor variables (in which case the "r sq"
effect size would be called the squared multiple correlation
coefficient, R squared) or an analysis of variance involving an
omnibus test of differences in three means in a one-way design (in
which case the "r sq" effect size would be called the correlation
ratio or eta squared).

INSERT TABLES 1 AND 2 HERE.

Each table presents results for a fixed effect size but
increasing sample sizes (4, 13, 23, 33, 43, 53, 63, or 123). For
the fixed effect size of 9.8% in Table 1, the result becomes
statistically significant when there are somewhere between 53 and
63 subjects in the analysis. For the 33.6% effect size reported in
Table 2, the result becomes statistically significant when there
are somewhere between 13 and 23 subjects in the analysis.

For a fixed effect size, adding subjects to the analysis
affects statistical significance in two ways. First, as
illustrated in Tables 1 and 2, the critical F at a fixed alpha
gets smaller as degrees of freedom error increase. Second, as the
degrees of freedom error increase, the mean square error gets
smaller, and thus the calculated F gets larger.
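Both mechanisms can be checked numerically. The Python sketch below assumes the Table 1 and 2 setting (two predictors, or three groups, so the F test has 2 and n - 3 degrees of freedom). Only when the numerator has 2 degrees of freedom does the upper tail of F take the simple closed form p = (1 + 2F/df2)^(-df2/2), which lets the sketch avoid any statistics library; the function names are illustrative and are not taken from the paper.

```python
# Sketch: a fixed effect size becomes "significant" once n is large enough.
# Assumes k = 2 predictors (or three groups), so df1 = 2 and df2 = n - 3.


def f_from_r_sq(r_sq: float, n: int, k: int = 2) -> float:
    """Calculated F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    df_error = n - k - 1
    return (r_sq / k) / ((1.0 - r_sq) / df_error)


def p_value_df1_2(f: float, df2: int) -> float:
    """Upper-tail p for F(2, df2); this closed form holds only when df1 = 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def critical_f_df1_2(alpha: float, df2: int) -> float:
    """Critical F(2, df2) at alpha, found by inverting the closed-form tail."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


def summarize(r_sq: float, sizes, alpha: float = 0.05):
    """Rows of (n, calculated F, critical F, p) for a fixed effect size."""
    rows = []
    for n in sizes:
        df2 = n - 3
        f = f_from_r_sq(r_sq, n)
        rows.append((n, f, critical_f_df1_2(alpha, df2), p_value_df1_2(f, df2)))
    return rows


if __name__ == "__main__":
    sizes = [4, 13, 23, 33, 43, 53, 63, 123]
    for r_sq in (0.098, 0.336):
        print(f"r sq = {r_sq}")
        for n, f, f_crit, p in summarize(r_sq, sizes):
            verdict = "significant" if p < 0.05 else "not significant"
            print(f"  n={n:4d}  F={f:7.3f}  F_crit={f_crit:8.3f}  p={p:.4f}  {verdict}")
```

Run as a script, it should show p crossing .05 between 53 and 63 subjects for r sq = .098 and between 13 and 23 subjects for r sq = .336, with the critical F falling and the calculated F rising as n grows. Substituting the F formula into the tail also shows that, in this special case, p = (1 - r sq)^((n - 3)/2): with the effect size held fixed, only n moves p.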

The researcher who does not genuinely understand statistical
significance would differentially interpret the effect size of
9.8% when there were 53 versus 63 subjects, and would
differentially interpret the effect size of 33.6% when
there were 13 versus 23 subjects in the analysis. Yet the effect
sizes within each table are fixed. Empirical studies of research
practice indicate that superficial understanding of significance
testing has actually led to serious distortions, such as
researchers interpreting significant results involving small
effect sizes while ignoring nonsignificant results involving
large effect sizes (Craig, Eison & Metze, 1976)!

Nor does significance testing typically inform the
researcher regarding the likelihood that results will be
replicated in future research (Carver, 1978). Researchers who
wish to estimate the likely replicability of results should
instead employ cross-validation logic (Campo, 1988), the
"jackknife" logic developed by Tukey and his colleagues (Crask &
Perreault, 1977), or the "bootstrap" logic developed by Efron and
his associates (Diaconis & Efron, 1983).
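The bootstrap idea can be sketched briefly: resample the observed cases with replacement many times, recompute the statistic each time, and examine how much it varies. The Python sketch below does this for a correlation coefficient on hypothetical simulated data (the data, seeds, and 2,000 replications are arbitrary illustrative choices). A narrow bootstrap distribution suggests a result is more likely to replicate than a wide one. This is a sketch of the general logic only, not the specific procedures described by Diaconis and Efron (1983).

```python
import random
import statistics


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_r(x, y, reps=2000, seed=0):
    """Resample (x, y) pairs with replacement; return the bootstrap values of r."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap sample of cases
        values.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return values


# Hypothetical data: 30 cases with a moderate positive relationship.
gen = random.Random(42)
x = [gen.gauss(0, 1) for _ in range(30)]
y = [0.5 * a + gen.gauss(0, 1) for a in x]

boot = sorted(bootstrap_r(x, y))
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot)) - 1]
print(f"observed r = {pearson_r(x, y):.3f}")
print(f"bootstrap SD = {statistics.stdev(boot):.3f}; 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

Resampling whole cases (pairs), rather than x and y separately, preserves the relationship whose stability is being estimated.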

Two aspects of significance testing interpretation in
dissertations warrant attention. First, some students use
language implying that they are interpreting significance tests
as if they were effect sizes. But, as Kerlinger (1986, p. 214)
emphasizes, "Tests of statistical significance like t and F
unfortunately do not indicate the magnitude or strength of
relations." Yet Kerlinger (1986) himself constantly refers to
results being "highly significant" (cf. pp. 187, 248, 334), and
other respected textbook authors do so as well (e.g., Cliff,
1987, p. 394). No wonder doctoral students such as Darlington#¹
(2001, p. 80) find themselves reporting that "The results of this
MANOVA was [sic verb agreement] highly significant."

¹ In order to minimize embarrassment to students and the
members of their dissertation committees, pseudonyms designated
with pound signs ("#") have been substituted for student names,
and 2001 is cited as the date on which these dissertations were
completed.

A second problem in language, implying the interpretation of
significance tests as effect sizes, involves the use of phrases
such as "the results approached statistical significance." Robert
Brown, former editor of the Journal of College Student Personnel,
made the humorous but telling comment at a recent conference:
"How do these authors know their results weren't trying to avoid
statistical significance?" Yet dissertation students such as
Spearman# (2001, p. 75) may find themselves reporting that "The
number of years of experience was not significant but did
approach significance."

The most serious misinterpretations of significance testing
tend to occur when sample sizes are small and effect sizes are
large but are underinterpreted, or when sample sizes are
commendably large and results are statistically significant but
effect sizes are modest and are overinterpreted. Guilford# (2001)
provides a thought-provoking example of the latter case.

Guilford# (2001) administered a measure of self-esteem and a
measure of achievement motivation to 1,401 subjects, and found
that scores on the two measures had a statistically significant
product-moment correlation of 0.449 (p. 83). Thus, the squared r
effect size in the study was 0.202, or 20.2%. Guilford# (2001, p.
96) argued that:

From the data collected in this study, it has been
established that a relationship exists between
self-esteem [Self-Esteem Inventory--SEI] and
achievement motivation [Resultant Achievement
Motivation Scale--RAM] for vocational-technical
students. The existence of this relationship can
be both important and useful in vocational-technical
education.

Based on the two measures having only 20.2% of their variance in
common, but the result being statistically significant,
Guilford# (2001, pp. 97-98) suggests that "the educator can
choose either the RAM or the SEI and administer it [and get the
same information] since this study has established the existence
of a relationship between achievement motivation and self-esteem
in vocational-technical students"!
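The arithmetic behind this objection is short. The Python sketch below uses only the values reported above (r = .449, n = 1,401) and the standard t test for a correlation, t = r·sqrt(n - 2)/sqrt(1 - r²). It pairs a very large test statistic with roughly 80% of variance that the two measures do not share, which is why a significant r does not make the instruments interchangeable.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common (r squared)."""
    return r * r


def t_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r, n = 0.449, 1401  # values reported for Guilford# (2001)
print(f"shared variance   = {shared_variance(r):.3f}")      # 0.202
print(f"unshared variance = {1 - shared_variance(r):.3f}")  # 0.798
print(f"t({n - 2}) = {t_for_r(r, n):.1f}")
```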

2. Dissertations should reflect the fact that multivariate
statistics are often vital in educational research.

Multivariate statistics have been available to researchers
for many years, although even today "there are many articles in
the research literature in which multiple univariate statistics
are calculated rather than a single multivariate analysis; for
instance, one article may report 50 t-tests rather than one
MANOVA" (Moore, 1983, p. 307). McMillan and Schumacher (1984)
isolated one reason why some researchers have hesitated to use
multivariate statistical methods:

The statistical procedures for analyzing many
variables at the same time have been available for
many years, but it has only been since the computer
that researchers have been able to utilize these
procedures. There is this lag in training of
researchers that has militated against the use of
these more sophisticated procedures. There are in
evidence more each year in journals, however. (p.
270)

Hinkle, Wiersma and Jurs (1979) concurred, noting that "it is
becoming increasingly important for behavioral scientists to
understand multivariate procedures even if they do not use them
in their own research." And recent empirical studies of research
practice do confirm that multivariate methods are employed with
some regularity in published behavioral research (Elmore &
Woehlke, 1988; Gaither & Glorfeld, 1985; Goodwin & Goodwin,
1985).

There are two reasons why multivariate methods are so
important in behavioral research, as noted by Thompson (1986b)
and by Fish (1988). First, multivariate methods control the
inflation of Type I "experimentwise" error rates. Most
researchers are familiar with "testwise" alpha. But while
"testwise" alpha refers to the probability of making a Type I
error for a given hypothesis test, "experimentwise" error rate
refers to the probability of having made a Type I error anywhere
within the study. When only one hypothesis is tested for a given
group of people in a study, "experimentwise" error rate will
exactly equal the "testwise" error rate.

But when more than one hypothesis is tested in a given
study, the two error rates will not be equal. Witte (1985, p.
236) explains the two error rates using an intuitively appealing
example involving a coin toss. If the toss of heads is equated
with a Type I error, and if a coin is tossed only once, then the
probability of a head on the one toss and of at least one head
within the set of one toss will both equal 50%. But if the coin
is tossed three times, even though the "testwise" probability of
a head on each given toss is 50%, the "experimentwise"
probability that there will be at least one head in the whole set
of three flips will be inflated to 87.5% (that is, 1 - .5 x .5 x .5).
Researchers control "testwise" error rate by picking small values,
usually 0.05, for the "testwise" alpha. "Experimentwise" error
rate, on the other hand, can be controlled at the "testwise" level
by employing multivariate statistics.

When researchers test several hypotheses in a given study,
but do not use multivariate statistics, the "experimentwise"
error rate will range somewhere between the "testwise" error rate
and the ceiling calculated in the manner illustrated in Table 3.
Where the experimentwise error rate will actually lie will depend
upon the degree of correlation among the dependent variables in
the study. Because the exact rate in a practical sense is readily
estimated only when the dependent variables are perfectly
correlated (and "experimentwise" error will equal the "testwise"
error) or are perfectly uncorrelated (and "experimentwise" error
will equal the ceiling calculated in the manner illustrated in
Table 3), it is particularly disturbing that the researcher may
not even be able to determine the exact "experimentwise" error
rate in some studies!
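The ceiling and Witte's coin example follow from one formula. Assuming the k tests are independent (the "perfectly uncorrelated" case), the probability of at least one Type I error is 1 - (1 - alpha)^k. The Python sketch below is illustrative only and does not reproduce Table 3 itself, which is not shown here; the function name is hypothetical.

```python
def experimentwise_ceiling(alpha: float, k: int) -> float:
    """P(at least one Type I error) across k independent tests at testwise alpha."""
    return 1.0 - (1.0 - alpha) ** k


# Witte's coin: "heads" plays the role of a Type I error (p = .5 per toss).
print(experimentwise_ceiling(0.5, 1))  # 0.5
print(experimentwise_ceiling(0.5, 3))  # 0.875

# Testwise alpha = .05 applied to several independent hypotheses.
for k in (1, 2, 5, 10, 20):
    print(k, round(experimentwise_ceiling(0.05, k), 4))
```

With perfectly correlated dependent variables the rate stays at alpha, so for real data the experimentwise rate falls somewhere between alpha and the printed value.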

INSERT TABLE 3 ABOUT HERE. 

Paradoxically, although the use of several univariate tests
in a single study can lead to too many hypotheses being
spuriously rejected, as reflected in inflation of
"experimentwise" error rate, it is also possible that the failure
to employ multivariate methods can lead to a failure to identify
statistically significant results which actually exist. Fish
(1988) provides a data set illustrating this equally disturbing
possibility. The basis for this paradox is beyond the scope of
the present treatment, but involves the second major reason why
multivariate statistics are so important.

Multivariate methods are often vital in behavioral research
because multivariate methods best honor the reality to which the
researcher is purportedly trying to generalize. Since
significance testing and error rates may not be the most
important aspect of research practice (Thompson, 1988c), this
second reason for employing multivariate statistics is actually
the more important of the two grounds for using these methods.
Thompson (1986b, p. 9) notes that the reality about which most
researchers wish to generalize is usually one "in which the
researcher cares about multiple outcomes, in which most outcomes
have multiple causes, and in which most causes have multiple
effects." As Hopkins (1980, p. 374) has emphasized:

These multivariate methods allow understanding of
relationships among several variables not possible
with univariate analysis... Factor analysis,
canonical correlation, and discriminant analysis--
and modifications of each procedure--allow
researchers to study complex data, particularly
situations with many interrelated variables. Such is
the case with questions based in the education of
human beings.

Similarly, McMillan and Schumacher (1984) argue that:

Social scientists have realized for many years that
human behavior can be understood only by examining
many variables at the same time, not by dealing with
one variable in one study, another variable in a
second study, and so forth... These [univariate]
procedures have failed to reflect our current
emphasis on the multiplicity of factors in human
behavior... In the reality of complex social
situations the researcher needs to examine many
variables simultaneously. (pp. 269-270)

Unfortunately, dissertations do not always reflect a
recognition that multivariate statistics are often vital in
research. For example, Cronbach# (2001, pp. 78-97) reported 15
Pearson chi-square tests of contingency table data, each with
degrees of freedom (95) that appear to be impossible for the
data. Similarly, Spearman# (2001, pp. 54-66) reports 10 separate
ANOVAs, each involving a factorial analysis, which maximally
inflates experimentwise error rates.

But Spearman# (2001, p. 78) was primarily interested in
interaction hypotheses, and was forced to report that "All null
hypotheses failed to be rejected because no statistical
differences [sic] were found in any of the groups tested for
interactions." Paradoxically, different findings might have been
isolated with the correct use of a multivariate method, as Fish
(1988) illustrates, and perhaps statistically significant
interactions would have resulted.

3. Dissertations should reflect the recognition that discarding
variance to conduct chi-square or OVA analyses can lead to
serious distortions in interpretations, and that even when OVA
methods are appropriate the methods should usually be
implemented using regression approaches.

Cohen (1968, p. 441) has characterized the conversion of
intervally scaled variables down to the nominal level of scale as
the "squandering [of] much information." As Kerlinger (1986, p.
558) explains, this squandering can lead to distorted results:

...Partitioning a continuous variable into a
dichotomy or trichotomy throws information away...
To reduce a set of values with a relatively wide
range to a dichotomy is to reduce its variance and
thus its possible correlation with other
variables.

Thompson (1988a, pp. 3-4) notes that:

Variance is the "stuff" of which all quantitative
research studies are made... It is not usually
sensible to invest serious effort in collecting
reliable and valid continuous score data, and to
then casually discard the information that we
previously went to some trouble to collect.
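The cost that Kerlinger and Thompson describe can be demonstrated with a small simulation. The Python sketch below uses hypothetical, normally distributed data with a population correlation of .60, splits the predictor at its median, and recomputes the correlation. For a median split of a normal variable, theory predicts the correlation shrinks by a factor of about sqrt(2/π), roughly .80, so the dichotomized r should land near .48; the seed and sample size are arbitrary choices.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_split(values):
    """Recode a continuous variable as 0 (at or below the median) or 1 (above)."""
    cut = statistics.median(values)
    return [1.0 if v > cut else 0.0 for v in values]


# Hypothetical data: population r = .60 between predictor x and outcome y.
rng = random.Random(2024)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.6 * a + rng.gauss(0, 0.8) for a in x]

r_cont = pearson_r(x, y)
r_dich = pearson_r(median_split(x), y)
expected = 0.6 * math.sqrt(2 / math.pi)  # theoretical r after a median split

print(f"continuous x:   r = {r_cont:.3f}")
print(f"dichotomized x: r = {r_dich:.3f} (theory: about {expected:.3f})")
```

Nothing about the relationship between the constructs changed between the two lines of output; only the variance discarded by the split did.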

Dissertation students frequently discard variance in order
to conduct either Pearson chi-square contingency table tests or
ANOVA, ANCOVA, MANOVA, or MANCOVA (hereafter labelled OVA
methods). Certainly there are many problems with typical
applications of the chi-square contingency table test (Thompson,
1988b), but OVA methods are more frequently applied (Elmore &
Woehlke, 1988; Gaither & Glorfeld, 1985; Goodwin & Goodwin,
1985), and empirical research indicates that the use of OVA
methods with variables that were originally intervally scaled
does introduce distortions (Thompson, 1986a). Thus, Cliff (1987,
p. 130) correctly criticizes the practice of discarding variance
on intervally scaled predictor variables to perform OVA analyses:

Such divisions are not infallible; think of the
persons near the borders. Some who should be highs
are actually classified as lows, and vice versa.
In addition, the "barely highs" are classified the
same as the "very highs," even though they are
different. Therefore, reducing a reliable variable
to a dichotomy makes the variable more unreliable,
not less.

Furthermore, even when variables are naturally nominally
scaled, regression approaches to OVA analyses still
tend to be superior to classical OVA calculations (Thompson,
1985).

Most researchers employing OVA methods are aware that "A
researcher cannot stop his analysis after getting a significant
F" (Huck, Cormier & Bounds, 1974, p. 68). Gravetter and Wallnau
(1985, p. 423) concur that "Reject H0 indicates that at least one
difference exists among the treatments. With k [means] = 3 or
more, the problem is to find where the differences are."

Many researchers employ unplanned (also called a posteriori
or post hoc) multiple comparison tests (e.g., Scheffé, Tukey, or
Duncan) to isolate which means are significantly different within
OVA ways (also called factors) having more than two levels.
Textbook authors tend to discuss unplanned comparisons in
somewhat pejorative terms. For example, several authors refer to
the application of these comparisons as "data snooping" (Kirk,
1968, p. 73; 1984, p. 360; Pedhazur, 1982, p. 305). Keppel (1982,
p. 150) makes reference to "milking" in his discussion of these
tests. Similarly, Minium and Clarke (1982, p. 321) note that:

Prior to running the experiment, the investigator
in our example had no well-developed rationale for
focusing on a particular comparison between means.
His was a "fishing expedition"... Such comparisons
are known as post hoc comparisons, because
interest in them is developed "after the fact"--it
is stimulated by the results obtained, not by any
prior rationale.

Planned (also called a priori or focused) comparisons
provide a valuable alternative to unplanned comparisons. Pedhazur
(1982, chapter 9) and Loftus and Loftus (1982, chapter 15)
provide readable explanations of these comparisons. Planned
comparisons typically involve weighting data by sets of
"contrasts" such as those presented by Thompson (1985) or those
presented in Table 4. Other types of contrasts, those which test
for trends in means, are provided by Fisher and Yates (1957, pp.
90-100) and by Hicks (1973).



INSERT TABLE 4 ABOUT HERE. 



Contrasts are typically developed to sum to zero, as do all
five contrasts presented for the data in Table 4. The data
represent a hypothetical validity study conducted to determine
whether various clinical groups score differently on a
psychological measure. Contrasts are uncorrelated or orthogonal
(as are the hypotheses they represent or test) when the contrasts
each sum to zero and when the cross-products of each pair of
contrasts also sum to zero. Thus, the contrasts presented in
Table 4 are uncorrelated.
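Both orthogonality conditions are easy to check mechanically. The Python sketch below applies them to a hypothetical set of Helmert-style contrasts for six groups. These are not the Table 4 contrasts, which are not reproduced here, although five contrasts likewise exhaust the five degrees of freedom among six means. The sketch assumes equal group sizes, as in the Table 4 design; with unequal sizes, each cross-product would have to be divided by the corresponding group size before summing.

```python
from itertools import combinations

# Hypothetical Helmert-style contrasts for six groups (NOT the Table 4 set).
CONTRASTS = [
    [5, -1, -1, -1, -1, -1],
    [0, 4, -1, -1, -1, -1],
    [0, 0, 3, -1, -1, -1],
    [0, 0, 0, 2, -1, -1],
    [0, 0, 0, 0, 1, -1],
]


def cross_product(c1, c2):
    """Sum of the products of corresponding contrast weights."""
    return sum(a * b for a, b in zip(c1, c2))


def mutually_orthogonal(contrasts):
    """True if every contrast sums to zero and every pair's cross-products sum to zero.

    Assumes equal group sizes.
    """
    if any(sum(c) != 0 for c in contrasts):
        return False
    return all(cross_product(a, b) == 0 for a, b in combinations(contrasts, 2))


print(mutually_orthogonal(CONTRASTS))  # True
# Two pairwise comparisons sharing group 1 are not orthogonal:
print(cross_product([1, -1, 0, 0, 0, 0], [1, 0, -1, 0, 0, 0]))  # 1
```

The second example shows why a full set of pairwise comparisons cannot be orthogonal: any two comparisons sharing a group overlap in the information they use.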

Some researchers do not believe that planned comparisons 
should necessarily be orthogonal. For example, Winer (1971, p. 
175) argues that "whether these comparisons are orthogonal or not 
makes little or no difference." However, orthogonal planned 
comparisons do have special appeal, for statistical reasons 
delineated elsewhere (Kachigan, 1986, p. 309). But as Keppel 
(1982, p. 147) suggests: 

The value of orthogonal comparisons lies in the
independence of inferences, which, of course, is a
desirable quality to achieve. That is, orthogonal
comparisons are such that any decision concerning 
the null hypothesis representing one comparison is 
uninfluenced by the decision regarding any other 
orthogonal comparison. The potential difficulty with 
nonorthogonal comparisons, then, is interpreting the 
different outcomes. If we reject the null hypotheses 
for two nonorthogonal comparisons, which comparison 
represents the "true" reason for the observed 
differences? 

There are two reasons why planned comparisons are usually
superior to unplanned comparisons. First, as noted by numerous
researchers (Glasnapp & Poggio, 1985, p. 474; Hays, 1981, p. 438;
Kirk, 1968, p. 95; Minium & Clarke, 1982, p. 322; Pedhazur, 1982,
pp. 304-305; Sowell & Casey, 1982, p. 119), planned comparisons
offer more power against Type II errors than do unplanned
comparisons, for reasons explained elsewhere (Games, 1971a,
1971b). For example, for the data presented in Table 4, the
omnibus test of differences among the six group means is not
statistically significant (F = 1.5, df = 5/6, p = .3155).
Furthermore, even if unplanned comparisons were conducted in
violation of conventional practice (since the omnibus test was not
statistically significant), statistically significant differences
would not have been identified either. However, a planned
comparison involving the mean of the two level-six subjects
versus the mean of the remaining 10 subjects would have been
statistically significant (F = 12.5, df = 1/6, p = .0054).

However, significance is not the be-all and end-all of
research (Thompson, 1988c). The more important reason to prefer
planned comparisons is that they tend to force the researcher to
be more thoughtful in conducting research, since planned
comparisons must be carefully formulated before data are collected
and since typically only a limited number of planned comparisons
can be stated in a given study. As Snodgrass, Levy-Berger and
Haydon (1985, p. 386) suggest, "The experimenter who carries out
post hoc comparisons often has a rather diffuse hypothesis about
what the effects of the manipulation should be." As Keppel (1982,
p. 165) notes,

Planned comparisons are usually the motivating
force behind an experiment. These comparisons are
targeted from the start of the investigation and
represent an interest in particular combinations
of conditions--not in the overall experiment.

Thus, as Kerlinger (1985, p. 219) suggests, "while post hoc tests
are important in actual research, especially for exploring one's
data and for getting leads for future research, the method of
planned comparisons is perhaps more important scientifically."

Wilks# (2001, p. 116) provides one of the more disturbing
examples of the use of OVA methods in a dissertation, even though
planned comparisons were applied in the study. In this study both
predictor variables, age and math anxiety, could have been
measured at the interval level of scale. Age was treated as a
trichotomy. Math anxiety data were actually collected at the
interval level of scale and were then converted into a dichotomy.
At least the cutoffs used in creating the dichotomy (p. 84)
were not decided with the same arbitrariness employed in the
initial decision by Wilks# to discard variance on both interval
predictor variables.

4. Dissertations should reflect a recognition that covariance 
statistical corrections are usually least helpful (and are 
most dangerous) when corrections are most needed. 

Many "statistical controls" can be invoked to adjust 
posttest scores when the quantitative researcher believes that 
either random assignment or design selection has failed to create 
groups that were equivalent at the start of the experiment or 
quasi-experiment. These statistical controls are available 
throughout the entire gamut of quantitative methods. For example, 
Gorsuch (1983, pp. 89-90) notes that the first factor extracted 
in a factor analysis can be located to pass directly through a 
"covariate" variable in factor space. Since the factors are 
uncorrelated, the effects of the first factor on all other 
factors will have been statistically controlled. 

Though many of these statistical controls date back to the 
beginning of the century (Nunnally, 1975, p. 9), most of the 
controls have not enjoyed wide use. Analysis of covariance 
(ANCOVA), for example, has been used in about four percent of the 
recently published research (Goodwin & Goodwin, 1985, pp. 8-9; 
Willson, 1980, p. ?). As explained by McGuigan (1983, p. 230): 
Briefly this technique enables you to obtain a 
measure of what you think is a particularly 
relevant extraneous variable that you are not 
controlling. This usually involves some 
characteristics of your participants. For 
instance, if you are conducting a study of the 
effect of certain psychological variables on 
weight, you might use as your measure the weight 
of your participants before you administer your 
experimental treatments. Through analysis of 
covariance, you then can "statistically control" 
this variable--that is, you can remove the effect 
of initial weight from your dependent variable 
scores, thus decreasing your error variance. 
One problem with statistical controls is that they assume 
very reliable measurement of the control variables. For example, 
Nunnally (1975, p. 10) notes that reliability will not usually 
have an appreciable influence on the substantive interpretation 
of most statistical procedures as long as reliability of 
measurement is at least 0.70, but that "Measurement reliability 
becomes crucial in employing statistical partialling 
operations, as in the analysis of covariance or in the use of 
partial correlational analysis." Cliff (1987, p. 129) concurs, 
noting that 

In general, partial correlation analysis is 
affected by any lack of reliability or validity in 
the variables. In many ways these effects resemble 
tuberculosis as it occurred a generation or two 
ago: They are widespread, the consequences are 
serious, the symptoms are easily overlooked, and 
most people are unaware of their etiology or 
treatment. 
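
A small numerical sketch shows why partialling is so sensitive to reliability. It assumes classical test theory: a true covariate T completely accounts for the correlation between X and Y, so the partial correlation controlling T is zero, and the observed covariate Z measures T with a given reliability, so its correlations with X and Y are attenuated by the square root of that reliability. The correlation of .70, the reliability values, and the function name `partial_r` are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical true-score correlations: X and Y each correlate .70 with the
# true covariate T, and their own correlation (.49) is due entirely to T.
r_xT = r_yT = 0.70
r_xy = r_xT * r_yT

for reliability in (1.00, 0.90, 0.70, 0.50):
    # Classical attenuation: observed r with Z = true r with T * sqrt(reliability).
    r_xz = r_xT * sqrt(reliability)
    r_yz = r_yT * sqrt(reliability)
    print(f"reliability {reliability:.2f}: partial r = {partial_r(r_xy, r_xz, r_yz):.3f}")
```

Under these assumptions a perfectly reliable covariate removes the X-Y relationship entirely, but a covariate with reliability .70 (the level Nunnally treats as adequate for most other procedures) leaves a spurious partial correlation of about .22, and lower reliabilities leave more.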

Unfortunately, too many researchers may not consider and 
certainly do not report the measurement error of their variables. 
As Willson (1980, p. 9) comments, "That reliability of 
instruments is unreported in almost half the published research 
is likewise inexcusable at this late date." 

Statistical control has been particularly appealing to some 
quantitative researchers when random assignment was not 
performed. These researchers expect the statistical adjustments 
of ANCOVA to magically make groups equivalent. 

However, the primary difficulty with statistical control 
performed to make groups equivalent involves the homogeneity of 
regression assumption of the methods. The methods assume that the 
relationship between the covariate and the dependent variable is 
equivalent in all experimental groups. This assumption is 
necessary because the statistical control procedures are 
implemented by adjusting the dependent variable to the extent 
that the covariate and the dependent variable are correlated when 
group membership information is completely ignored. 

Campbell and Erlebacher (1975) present a concrete 
illustration of how the use of statistical controls can seriously 
distort findings when the homogeneity of regression assumption is 
not met. ANCOVA has been very appealing in research investigating 
the effects of compensatory education programs. In these cases 
the treatment intervention is made available to all or most 
children who are eligible. The control group usually consists of 
children who were not eligible for the treatment and, therefore, 
the group is inherently different in its character from the 
treatment group. In these analyses both the dependent variable 
and the covariate are cognitive variables. The statistical 
control procedure assumes that the relationship between the two 
variables is the same in both groups, i.e., since correlation is 
a measure of the slope of the regression line for the two 
variables, that children who are eligible for and receive 
compensatory interventions learn at the same rate as children who 
are not eligible for the intervention. 

The decision to blithely use the statistical control when 
the homogeneity of regression assumption is not met leads to 
"tragically misleading analyses" that actually "can mistakenly 
make compensatory education look harmful" (Campbell & Erlebacher, 
1975, p. 597). Similarly, Cliff (1987, p. 273) argues that, "It 
could be that the relationship between the dependent variable and 
the covariate is different under different treatments. Such 
occurrences tend to invalidate the interpretation of the simple 
partial correlations described above." 

Persons who wish to use statistical controls of this type 
are usually trapped in a nasty dilemma. If the controls are not 
needed, then they should not be used. But if statistical control 
is needed because the groups in a study are not equivalent, then 
often the homogeneity of regression assumption cannot be met, and 
the use results in seriously distorted inferences. 

It is interesting to note that many researchers do not 
recognize the paradox of testing both analytic assumptions and 
substantive hypotheses for statistical significance. Researchers 
frequently try to obtain as large a sample as possible, so that 
chances for "significance" of substantive tests are maximized. 
But the same large samples also make it more likely that tests of 
homogeneity of variance or of regression will be statistically 
significant, even when departures from the assumptions are trivial. 
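
The arithmetic behind this paradox is easy to show. In a one-way design, a fixed proportion of variance explained (eta squared) converts to F = (eta squared / df_effect) / ((1 - eta squared) / df_error), so F grows in direct proportion to the error degrees of freedom. The Python sketch below applies that identity to a hypothetical, trivially small two-group effect (eta squared = .01); the same logic applies to the F tests used to check assumptions such as homogeneity of variance or of regression. The function name is illustrative only.

```python
def f_from_eta_squared(eta_sq, n, k=2):
    """F for a one-way design with k groups and n total cases, given eta squared."""
    df_effect, df_error = k - 1, n - k
    return (eta_sq / df_effect) / ((1 - eta_sq) / df_error)


# Hypothetical, trivially small effect: 1% of variance explained.
for n in (50, 200, 500, 2000):
    print(f"n = {n:5d}: F(1, {n - 2}) = {f_from_eta_squared(0.01, n):.2f}")
```

The effect is equally trivial in every row, yet at n = 500 the F of about 5.03 already exceeds the .05 critical value, which is roughly 3.86 for these degrees of freedom.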

The fallacious use of statistical control in inappropriate 
ANCOVA applications needs to be recognized by more researchers, 
as some researchers have long warned of these various dangers 
(Elashoff, 1969; Lord, 1960). ANCOVA is a special case of 
regression analysis. As Cliff (1987, p. 275) notes, "We could say 
that we are fitting a single regression equation to the data for 
all the groups and then doing an anova of the deviation from the 
regression line." 

Consider the hypothetical data presented in Table 5. The 
hypothetical study involves four children from a compensatory 
program ("A") who have lower mean achievement (mean = -.19) on the 
cognitive pretest ("ZX") than do their peers (mean = .19) from the 
noncompensatory group ("B"). Furthermore, as one might expect, and as 
illustrated in Figure 1 (which also presents the cognitive 
posttest ("ZY") scores of the eight children), the children in 
the two groups are learning at different rates. 

INSERT TABLE 5 AND FIGURE 1 ABOUT HERE. 

Nevertheless, the ANCOVA procedures employ the single beta 
weight (r = beta weight in the two-variable case = .81) derived by 
ignoring the group membership ("A" or "B") of the children, i.e., 
derived by ignoring the fact that the children are learning at 
different rates. This beta weight adjustment is presented in 
Figure 1 as the regression line for the variables, derived 
ignoring group membership. However, Figure 1 also indicates that 
the slopes of regression lines computed separately for the two 
groups are different, and that it is not reasonable to use the 
same adjustment for both groups. 
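
The unequal slopes shown in Figure 1 can be checked directly from the Table 5 values. This Python sketch computes the least-squares slope of ZY on ZX within each group and compares it with the single slope obtained when group membership is ignored. Because the Table 5 entries are rounded to two decimals, the common slope comes out at about .81 rather than exactly .813; the helper name `slope` is illustrative.

```python
from statistics import mean

# (group, ZY, ZX) values copied from Table 5 (rounded to two decimals).
data = [
    ("A", -0.88, -1.68), ("A", -0.44, -0.68), ("A", 0.00, 0.31), ("A", 0.44, 1.30),
    ("B", -1.32, -0.68), ("B", -0.44, -0.19), ("B", 0.88, 0.56), ("B", 1.76, 1.06),
]


def slope(pairs):
    """Least-squares slope of y on x for a list of (y, x) pairs."""
    my = mean(y for y, _ in pairs)
    mx = mean(x for _, x in pairs)
    sxy = sum((x - mx) * (y - my) for y, x in pairs)
    sxx = sum((x - mx) ** 2 for _, x in pairs)
    return sxy / sxx


everyone = [(y, x) for _, y, x in data]
print(f"ignoring groups: {slope(everyone):.2f}")
for g in ("A", "B"):
    print(f"group {g}: {slope([(y, x) for grp, y, x in data if grp == g]):.2f}")
```

The group B slope (about 1.77) is roughly four times the group A slope (about .44), so the single line with a slope of about .81 misdescribes both groups.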

Table 6 presents conventional ANOVA results for this data 
set when no covariance adjustments are implemented. Table 7 
presents an ANCOVA utilizing pretest scores ("ZX") as a 
covariate. Table 8 presents an ANOVA performed on the residual 
raw scores ("YE" = "ZY" - "YHAT"); this analysis demonstrates 
that ANCOVA is an ANOVA on posttest scores once the posttest 
scores have been residualized on the covariate (yielding "YE") in a 
regression analysis that completely ignores group membership 
information. 
INSERT TABLES 6 THROUGH 8 ABOUT HERE. 
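
The residualizing logic behind Table 8 can be sketched in a few lines of Python. Following the Table 5 note, the sketch forms the single beta weight as the sum of the ZX times ZY cross products divided by n - 1, computes YHAT and YE for every child while ignoring group membership, and then splits the YE sum of squares into treatment and error parts. It works from the rounded Table 5 values, so the results agree with the tables only to about two decimals.

```python
from statistics import mean

# (group, ZY, ZX) values copied from Table 5 (rounded to two decimals).
data = [
    ("A", -0.88, -1.68), ("A", -0.44, -0.68), ("A", 0.00, 0.31), ("A", 0.44, 1.30),
    ("B", -1.32, -0.68), ("B", -0.44, -0.19), ("B", 0.88, 0.56), ("B", 1.76, 1.06),
]
n = len(data)

# Single beta weight that ignores group membership (Table 5 note).
beta = sum(zy * zx for _, zy, zx in data) / (n - 1)
# Residual posttest score YE = ZY - YHAT, where YHAT = beta * ZX.
ye = {g: [zy - beta * zx for grp, zy, zx in data if grp == g] for g in ("A", "B")}

# One-way ANOVA partition of the YE scores by group.
all_ye = ye["A"] + ye["B"]
grand = mean(all_ye)
ss_treatment = sum(len(v) * (mean(v) - grand) ** 2 for v in ye.values())
ss_error = sum((e - mean(v)) ** 2 for v in ye.values() for e in v)
print(f"beta = {beta:.3f}")
print(f"SS treatment = {ss_treatment:.2f}, SS error = {ss_error:.2f}")
```

The sketch reproduces the beta weight (.813) and the treatment sum of squares (.04); its error sum of squares, about 2.31, differs slightly from the 2.33 in Tables 7 and 8, presumably because of rounding in the Table 5 entries.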



What many researchers do not understand is how ANCOVA can 
make the experimental intervention appear less effective. Figure 
2 represents a case in which the covariate ("X") is associated 
with the dependent variable ("Y"), but not with the assignment to 
experimental conditions ("A"). In other words, the homogeneity of 
regression assumption is met. 

Table 9 presents a one-way ANOVA corresponding to the Figure 
2 Venn diagram. Table 10 presents the related ANCOVA. In this 
example the covariate adjustment involves only variance in the 
dependent variable that is not associated with assignment 
to experimental conditions. Therefore, the sum of squares for the 
main effect remains unchanged, but the covariate does reduce the 
sum of squares for error. This results in a smaller mean square 
error, and thereby a larger calculated F for the main effect. 

INSERT FIGURE 2 AND TABLES 9 AND 10 ABOUT HERE. 

But Figure 3 presents a case where the homogeneity of 
regression assumption is not met. Tables 11 and 12 present the 
related ANOVA and ANCOVA results, respectively. Although the 
intervention does have some effect, the application of the 
covariate in this "worst case" example makes the intervention 
appear entirely ineffective. Clearly, covariance adjustments can 
have effects that some researchers do not recognize. 

INSERT FIGURE 3 AND TABLES 11 AND 12 ABOUT HERE. 
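
The calculated F values in Tables 9 through 12 follow directly from the sums of squares and degrees of freedom reported there, F = (SS_treatment / df_treatment) / (SS_error / df_error). The Python sketch below recomputes them; the Table 10 error sum of squares is taken as 45, the value implied by its total of 100 less 20 for the covariate and 35 for treatment. The dictionary layout and function name are illustrative.

```python
# (SS treatment, df treatment, SS error, df error) taken from Tables 9-12.
tables = {
    "Table 9 (ANOVA, Figure 2)": (35, 1, 65, 9),
    "Table 10 (ANCOVA, Figure 2)": (35, 1, 45, 8),
    "Table 11 (ANOVA, Figure 3)": (20, 1, 80, 9),
    "Table 12 (ANCOVA, Figure 3)": (0, 1, 70, 8),
}


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Calculated F: effect mean square divided by error mean square."""
    return (ss_effect / df_effect) / (ss_error / df_error)


for name, cells in tables.items():
    print(f"{name}: F = {f_ratio(*cells):.2f}")
```

In the Figure 2 case the covariate leaves the treatment sum of squares at 35 but shrinks the error term, so F rises from about 4.85 to about 6.22 and passes its critical value of 5.32; in the Figure 3 case the covariate absorbs all 20 units of treatment variance and F falls to zero.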

The fact that ANCOVA is simply ANOVA on the residual raw 
scores may also be disturbing from an interpretation point of 
view. The researcher took a variable that presumably had some 
meaning ("ZY"), made an adjustment on it, and was left with an 
analysis of a residual raw score that, unlike the original 
dependent variable, has little intrinsic meaning. The result 
might be difficult to interpret even if the adjustment was 
reasonable, i.e., if the homogeneity of regression assumption had 
been met. 

Too many researchers blindly apply ANCOVA absent an 
understanding of either the method's logic or its pivotal 
assumptions. As McGuigan (1983, p. 231) has observed, ANCOVA 
can be seriously misused, and one cannot be assured 
that it can "save" a shoddy experiment. Some 
researchers overuse this method as in the instance of 
a person I once overheard asking of a researcher, 
"Where is your analysis of covariance?"--the 
understanding in his department was that it is always 
used in experimentation. 
Of course, the preceding discussion of the ANCOVA case 
generalizes to the various types of statistical control that are 
available to researchers. 

ANCOVA is not robust to the violation of the homogeneity of 
regression assumption, but dissertation students routinely 
decline to evaluate this assumption. For example, Scheffe# (2001, 
pp. 109-110) did not test the assumption, but argued that "The 
ANCOVA is a more powerful statistic that ANOVA since it is more 
likely to detect true differences between groups (Huck, Cormier, 
& Bounds, 1974)" (p. 107). Pearson# (2001, p. 93) suggested that, 
"An ANCOVA adds power to this analysis by controlling the within 
group variability related to teacher with administrator 
interactions," though it is not clear whether the assumption was 
tested in the ANCOVA application. Meehl# (2001, p. 52) similarly 
argued that ANCOVA is very useful but provided no test of 
homogeneity of regression. 

Anastasi# (2001) provides a particularly noteworthy 
application of ANCOVA. Anastasi# employed one intact class of 30 
fourth-grade students as the experimental group, and one intact 
class of 35 fourth-grade students as a control group. Pretest 
achievement data (Anastasi#, 2001, p. 79) indicated that the two 
groups differed by a standardized effect size of roughly 2.0 
standard deviations, a huge difference! Fourteen of the 30 
experimental group subjects (47%) had been retained in grade at 
least once (three students repeated grades twice); no control 
group subjects had been retained. 

Anastasi# (2001, p. 78) explains the reason the groups were 
so systematically different: the students were homogeneously 
assigned to classes to guarantee (successfully) that the classes 
would be different. Anastasi# (2001, p. 78) also explains that, 
"test scores on all subjects were not available prior to the 
beginning of the study, so the extent of the differences between 
the experimental and control subjects was not known." 

Anastasi# (2001, p. 101) decided that, "Since a 
statistically significant difference existed between the reading 
and language achievement levels of the experimental and control 
groups, an analysis of covariance was computed to equate the 
groups." Actually, Anastasi# (2001, pp. 124-136) reports a series 
of ANCOVAs, but no tests of the homogeneity of regression 
assumption. 

It is particularly intriguing that Anastasi# (2001, pp. 124-136) 
used four covariates, rather than one. As Cliff (1987, p. 278) 
explains, "since this is really a form of regression, inferences 
become slippier as the variables [covariates] increase" in 
number. Furthermore, a "post hoc" analysis employing a t-test of 
uncorrected dependent variable scores is somehow employed to 
explore ANCOVA results associated with the corrected dependent 
variable scores (Anastasi#, 2001, pp. 136-137). 

5. Dissertations should reflect the recognition that stepwise 
analytic methods can lead to seriously distorted 
interpretations. 

Stepwise analytic methods may be among the most popular 
research practices employed in both substantive and validity 
research. As commonly employed, these methods allow the entry of 
predictor variables one step at a time, and at each step the 
removal of previously entered variables is also considered. The 
methods seem to be employed somewhat casually, especially in 
regression and discriminant analysis research, though variants 
are also available when other techniques are used (cf. Thompson, 
1984, pp. 47-51). 

With respect to regression applications, Marascuilo and 
Serlin (1988, p. 671) note that, "The most popular method in use 
for selecting the fewest number of predictor variables necessary 
to guarantee adequate prediction is based on a model referred to 
as stepwise regression." Huberty (in press) concurs, suggesting 
that "The conduct of analytical procedures in 'steps' is quite 
common... [These] procedures have enjoyed widespread use by 
social and behavioral researchers." Unfortunately, stepwise 
methods can lead to serious misinterpretations of results, and 
"social science research is replete with misinterpretations of 
this kind" (Pedhazur, 1982, p. 168). 

Three problems with stepwise methods merit special emphasis. 
First, most researchers, thanks to "canned" computer programs, do 
not employ the correct degrees of freedom when evaluating changes 
in explained variance, i.e., usually changes in squared R or 
lambda. For example, in a stepwise regression analysis, the 
researcher at step two may add a second predictor variable into a 
prediction equation. The researcher might test the significance 
of the change in squared R by an F test using 1 and n - a - 1 degrees 
of freedom, where a is the number of predictor variables in the 
last step. The numerator degrees of freedom reflect a premise 
that only one additional predictor variable was employed to yield 
the squared R change, but ignore the fact that the added 
predictor was selected by consulting empirical sample results 
involving a larger set of candidates for entry into the 
prediction process. Thus, the process ignores the fact that, "in 
a sense, all the variables are in the equation, even though some 
of them have [effectively] been given zero weights" (Cliff, 1987, 
p. 187). Consequently, Cliff (1987, p. 185) suggests that "most 
computer programs for [stepwise] multiple regression are 
positively satanic in their temptation toward Type I errors." 
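
The conventional test described here can be written as a short formula: for the predictor added at step a, F = (change in squared R / 1) / ((1 - squared R at step a) / (n - a - 1)). The Python sketch below implements that formula with hypothetical values (n = 50, squared R rising from .20 to .28 at step two). It deliberately reproduces what a canned program would compute, a numerator df of 1, and so it inherits the problem the text describes; the function name `f_change` is illustrative.

```python
def f_change(r2_current, r2_previous, n, a, df_added=1):
    """Conventional F for the change in squared R at step a.

    r2_current: squared R with a predictors; r2_previous: squared R at the
    prior step; df_added: predictors added at this step (1 in a stepwise run).
    """
    delta = r2_current - r2_previous
    return (delta / df_added) / ((1 - r2_current) / (n - a - 1))


# Hypothetical step two: squared R rises from .20 to .28 with n = 50.
print(round(f_change(0.28, 0.20, n=50, a=2), 2))
```

The function has no way of knowing that the entered predictor was the best of several candidates, which is why, as Cliff warns, the nominal alpha level can understate the real risk of a Type I error.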

Second, some researchers incorrectly interpret stepwise 
results in which a predictor variables have been selected as 
indicating that the predictor variables are the best variables to 
use if the predictor variable set is limited to size a. In fact, 
in a stepwise analysis in which three steps are conducted, and 
predictors A, B, and C are employed, it is entirely possible that 
three different predictors would represent the optimal predictor 
set of size three. Stepwise methods select the next-best 
predictor at each step, given the presence of previous 
predictors; this is not the same as selecting the optimal 
predictor variable set of size a. As Huberty (in press) notes, 
"It is generally understood by methodologists that the first a 
variables entered into either a regression analysis or a 
discriminant analysis do not necessarily constitute the 'best' 
subset of size a." 
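
A small hypothetical example illustrates the point. The Python sketch below works from an invented correlation matrix (not the Table 14 values) and computes squared R for two standardized predictors with the standard two-predictor formula, R squared = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2). It uses a simplified forward-selection version of the stepwise procedure (no removal step): X1 enters first because it has the largest correlation with Y, and then the predictor that adds most given X1. An exhaustive search over all pairs is shown for comparison. All variable names and values are hypothetical.

```python
from itertools import combinations

# Hypothetical correlations of each predictor with the criterion Y.
r_y = {"X1": 0.60, "X2": 0.50, "X3": 0.45}
# Hypothetical predictor intercorrelations, chosen so that the full
# correlation matrix is positive definite (X2 and X3 are negatively related).
r_xx = {("X1", "X2"): 0.30, ("X1", "X3"): 0.30, ("X2", "X3"): -0.40}


def r2_pair(a, b):
    """Squared multiple correlation of Y on two standardized predictors."""
    r12 = r_xx[tuple(sorted((a, b)))]
    ra, rb = r_y[a], r_y[b]
    return (ra ** 2 + rb ** 2 - 2 * ra * rb * r12) / (1 - r12 ** 2)


# Step 1: enter the predictor with the largest squared correlation with Y.
first = max(r_y, key=lambda p: r_y[p] ** 2)
# Step 2: enter the remaining predictor that gives the largest squared R with it.
second = max((p for p in r_y if p != first), key=lambda p: r2_pair(first, p))
stepwise_pair = (first, second)

# Exhaustive search over every pair of predictors.
best_pair = max(combinations(sorted(r_y), 2), key=lambda pair: r2_pair(*pair))

print("stepwise:", stepwise_pair, round(r2_pair(*stepwise_pair), 3))
print("best pair:", best_pair, round(r2_pair(*best_pair), 3))
```

Here the stepwise pair (X1, X2) explains about 47% of the variance, whereas X2 and X3 together explain about 75%; the best subset of size two does not even include X1, the variable stepwise selection entered first.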

Third, some researchers incorrectly consult order of entry 
information to evaluate the importance of various predictor 
variables. As Huberty (in press) explains, 

The first variable entered with a stepwise 
regression analysis is determined by the 
correlation between each predictor variable and 
the criterion variable... The third, say, variable 
to be entered (and often considered to be the 
third most important) is dependent on the two 
variables already entered. If one or two of the 
variables already entered would be changed, then 
the third variable entered may also be different. 
This dependence or conditionality truly makes 
variable importance as determined by stepwise 
analyses very questionable. 

The small data set for a population (N=12) presented in 
Table 13 can be employed to illustrate how sampling error can 
seriously distort the interpretation of stepwise results involved 
in predicting dependent variable ZY. Table 14 indicates that the 
four predictor variables share little variance with each other 
and that the order of predictor variable explanatory power is, 
respectively, ZX1, ZX2, ZX3, and ZX4. 

INSERT TABLES 13 AND 14 ABOUT HERE. 

Presume that the researcher draws a random sample of nine 
subjects from the population of 12 persons. Each of 55 random 
collections of nine subjects (omit subjects 1,2,3; omit 1,2,4; 
etc.) is equally probable. For these illustrative data, only 
eight samples (omit 1,2,5; 1,2,7; 2,3,7; 2,3,10; 3,4,5; 5,6,8; 
7,8,9; and 7,8,12) enter the four predictor variables in the 
order that is known to be correct when the true population 
parameters are consulted. 

Indeed, only 23 samples select predictor ZX1 as the first 
prediction entry. Sixteen samples select ZX2 as the first entry; 
10 samples select ZX3 as the first variable entered; six samples 
select the worst predictor, ZX4, as the first or best predictor 
of ZY. For the sample omitting subjects 3, 4 and 9, the predictor 
variables are entered in the order: ZX4, ZX2, ZX3, and ZX1. 

Clearly, sampling error can seriously distort stepwise 
results. As Kachigan (1986, p. 265) argues, 

there is the danger that we might select variables 
for inclusion in the regression equation based on 
chance relationships. Therefore, as stressed in 
our discussion of multiple correlation, we should 
apply our chosen regression equation to a fresh 
sample of objects to see how well it does in fact 
predict values on the criterion variable. This 
validation procedure is absolutely essential if we 
are to have any faith at all in the future 
applications of the regression equation. 
Alternatively, the researcher might employ a cross-validation 
procedure such as the one recommended by Huck, Cormier and Bounds 
(1974, p. 159). 

Given these considerations, Kerlinger (1986, p. 545) argues 
that "the research problem and the theory behind the problem [and 
not stepwise methods] should determine the order of entry of 
variables in multiple regression analysis." Researchers who 
choose to employ stepwise methods, particularly if they also fail 
to use replication or cross-validation methods, might best 
consider Cliff's (1987, pp. 120-121) argument that "a large 
proportion of the published results using this method probably 
present conclusions that are not supported by the data." 

Dissertation students are not always aware of these 
subtleties. For example, Pearson# (2001, p. 92) reports a 
stepwise regression analysis. But Wilks# (2001, pp. 122-127) 
presents a more noteworthy application involving six steps of 
analysis. Wilks# somehow interprets the stepwise multiple 
regression results (p. 126) in comparison with a bivariate 
correlation matrix (p. 124) not involving the same subjects. But 
the comparison is instructive in indicating how stepwise results 
can lead to interpretation errors. Wilks# (2001, p. 127) reports 
that, "The results of the stepwise regression indicate that 
computer experience [r = -0.288] and mathematics anxiety [r = -0.141] 
contribute significantly to the variance on computer anxiety," the 
dependent variable. The importance of these two predictors was 
emphasized. Yet a third predictor variable, number of previous 
math classes, apparently had a larger r [-0.175] with the 
dependent variable than did math anxiety. 

6. Dissertations should reflect a recognition that 
instrumentation must have psychometric integrity if studies 
are to yield meaningful results. 

It is axiomatic that measurement integrity is vital in 
quantitative research. As Kerlinger (1986) explains with respect 
to reliability, for example, 

Since unreliable measurement is measurement 
overloaded with error, the determination of 
relations becomes a difficult and tenuous 
business. Is an obtained coefficient of 
determination between two variables low because 
one or both measures are unreliable? Is an 
analysis of variance F ratio not significant 
because the hypothesized relation does not exist 
or because the measure of the dependent variable 
is unreliable? ...High reliability is no guarantee 
of good scientific results, but there can be no 
good scientific results without reliability. (p. 415) 
Too few authors of published research report measurement 
statistics regarding their instrumentation, as Willson (1980, p. 
9) notes. So it may not be too surprising that some dissertation 
students apparently do not consider these requirements. 

For example, Cronbach# (2002, p. 66) provides the following 
description of the sole instrument employed in the dissertation: 
The survey instrument was designed by the author 
for this study. It consisted of two parts. The 
first part contained four demographic questions 
about the respondent and his/her institution. The 
second part contained 20 statements about the 
Consent Decree to which each participant would 
respond on a 5-point Likert scale. All responses 
were made on an answer sheet suitable for optical 
scanning to maximize accurate evaluation of the 
data. 

No information regarding validity or reliability is presented. 

Similarly, Cohen# (2001, pp. 129-133) developed an 
instrument that presumed faculty subjects would be aware of 
practices of other faculty. The full description of the 
investigation of this measure's psychometric integrity was rather 
terse: "Face and content validation of this instrument was 
obtained by a review of the literature in occupational therapy 
and jury review" (Cohen#, 2001, p. 52). 

Cohen# (2001, p. 56) also developed a new genre of 
hypothesis substance and testing logic: 

Sub-hypothesis 1.2: 
There are no discernible 
attitudes [sic] of occupational therapy faculty as 
they relate to the computer as a threat to 
society. Category two of the ATCQ-OT, Computer 
Threat to Society provided a basis for 
investigating this sub-hypothesis. The resulting 
mean response of 10.880 was within ten percent of 
the middle rating of 12 [the scale midpoint] on 
this factor, as measured by four items. Such an 
insignificant variance does not permit rejection 
of this null sub-hypothesis. 

7. Dissertations should reflect a recognition that it is not 
desirable to present statistically nonsensical or impossible 
results. 

Dissertations are more than a test of the student's ability 
to conduct original and independent research. Dissertations make 
critical contributions to the scholarly literature, for few 
contributions can reasonably be expected to involve as much 
thought and work or the pooled talent of as many scholars as are 
theoretically represented by the combination of the student and 
the members of the dissertation committee. Thus, it is not 
desirable to report nonsensical or impossible quantitative 
results that call into question the integrity of the remainder of 
a project as well. 

But dissertations do occasionally report just such results. 
For example, Scheffe# (2001, p. 109) reported that 

Since the smallest cell size in the initial ANCOVA 
was 85, 85 subjects were randomly selected for 
each cell. The results of the ANCOVA indicated a 
significant difference among the adjusted mean 
scores of the four groups of subjects on desired 
knowledge about computers [F(4, 335) = 2.54, 
p < .05]. 

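The covariate effect size is not reported (p. 110). The 
researcher pooled the sum of squares for the main effect together 
with the sum of squares for the covariate, thus completely 
confounding interpretations of both effects. With four groups of 
85 subjects (N = 340) and one covariate, the main effect has 3 
degrees of freedom, the covariate 1, and error 340 - 4 - 1 = 335, 
so the numerator degrees of freedom of 4 reflect the pooled 
effects. Furthermore, the 
confounded result appears to be interpreted as being solely due 
to the main effect. 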

But a more disturbing example is provided by Cronbach# 
(2001, pp. 102-104), who reported a factor analysis in which 15 
factors were extracted, each ostensibly involving a different 
number of iterations. Regardless of whether iterations were 
employed in estimating communalities or in rotation, the number 
of iterations is one value for the entire solution. Thus, the 
result does not appear to be plausible. 

Summary 

Dissertations are the cumulative, tangible "best evidence" 
of interests of doctoral faculty and students in serious and 
incisive scholarship. Thus, dissertations are thoroughly studied 
by the program review teams periodically hired by boards of 
higher education in most states. The present paper explored seven 
errors in quantitative analysis in both published literature and 
dissertations. The errors are explained in detail using 
references to works by other authors and small hypothetical data 
sets to illustrate problems. Concrete examples of the errors as 
they occur in dissertations are cited to make clear that the 
errors are not hypothetical. Ten dissertations, completed since 
January of 1985, are cited as examples, although pseudonyms are 
employed to avoid embarrassment of the students or their doctoral 
advisors. The discussion should also be useful to authors of 
published research who may wish to avoid the same errors. 
Table 1 

Statistical Significance at Various Sample Sizes 
for a Fixed Effect Size (Moderate Effect Size) 
Table 2 

Statistical Significance at Various Sample Sizes 
for a Fixed Effect Size (Larger Effect Size) 
Table 3 

"Testwise" and "Experimentwise" Error Rates for Selected Studies 

"Testwise"   "Experimentwise" Rate 
Rate         Minimum   Maximum = 1 - (1 - "Testwise") ** n of Tests 
05.0%        05.0%     1 - (1 - 05.0%) ** 1    = 
05.0%        05.0%     1 - (95.0%) ** 1        = 
05.0%        05.0%     1 - 95.0%               = 05.00% 
05.0%        05.0%     1 - (1 - 05.0%) ** 5    = 22.62% 
05.0%        05.0%     1 - (1 - 05.0%) ** 10   = 40.13% 
05.0%        05.0%     1 - (1 - 05.0%) ** 20   = 64.15% 

Note. An alpha of 0.05 equals an alpha of 05.0%. "**" means 
"raised to the power of". The first several rows of the table 
illustrate that "testwise" and "experimentwise" error rates 
are the same when only one test is conducted. 
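
The Table 3 arithmetic generalizes to any number of tests. Assuming, as the table's maximum does, that the tests are independent, the experimentwise error rate is 1 - (1 - alpha) raised to the power of the number of tests. A minimal Python sketch (the function name is illustrative):

```python
def experimentwise_rate(alpha, n_tests):
    """Maximum experimentwise error rate for n independent tests at alpha."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(f"{n:2d} tests: {experimentwise_rate(0.05, n):.2%}")
```

With alpha = .05 it reproduces the Table 3 values of 5.00%, 22.62%, 40.13%, and 64.15% for 1, 5, 10, and 20 tests.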
Table 4 

Hypothetical Validity Study Data 



n T\ 


T n 


U V 


ux 










± 
X u 
"•X 
2. 




X u 


X 


"•X 


~X 


~X 


~X 
20 
-1 


-1 


-1 


4 


7 


10 


0 


0 


3 


-1 


-1 




8 


20 


0 


0 


3 


-1 


-1 


5 


9 


10 


0 


0 


0 


4 


-1 




10 


20 


0 


0 


0 


4 


-1 


6 


11 


25 


0 


0 


0 


0 


5 




12 


35 


0 


0 


0 


0 


5 



Table 5 

Hypothetical ANCOVA Data Set 

Group      ZY      ZX    ZYZX    YHAT     YE 
A        -.88   -1.68    1.48   -1.36    .48 
A        -.44    -.68     .30    -.56    .11 
A         .00     .31     .00     .25   -.25 
A         .44    1.30     .57    1.06   -.62 
B       -1.32    -.68     .90    -.56   -.77 
B        -.44    -.19     .08    -.15   -.29 
B         .88     .56     .49     .45    .43 
B        1.76    1.06    1.86     .86    .91 

Note. The beta weight for the covariance procedure (.813) equals 
the sum of the cross products (ZYZX) of ZX and ZY divided by n - 1 
(5.694/7). The predicted posttest score (YHAT) is each child's 
pretest (ZX) multiplied by the beta weight. The error in each 
prediction (YE) is equal to ZY minus YHAT. 



Table 6 
Conventional ANOVA Results 

              Sum of          Mean              Effect 
Source        Squares   df    Squares    F      Size 
Treatment       .39      1      .39     .35     .056 
"Error"        6.61      6     1.10 
Total          7.00      7     1.00 

Note. Effect size is an r squared analog. 
Table 7 
ANCOVA Results 

              Sum of          Mean              Effect 
Source        Squares   df    Squares    F      Size 
Covariate      4.63      1     4.63     9.95    .661 
Treatment       .04      1      .04      .08    .006 
"Error"        2.33      5      .47 
Total          7.00      7     1.00 

Note. Effect size is an r squared analog. 






Table 8
ANOVA Results Using YE as Dependent Variable

                Sum of           Mean             Effect
Source          Squares    df    Squares    F     Size
Treatment          .04      1       .04    .08    .006
"Error"           2.33      5       .47
Total             2.37      6
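Table 8 instead runs an ANOVA on the prediction errors (YE) from Table 5. A minimal sketch follows, assuming the pooled beta from the Table 5 note and the A/B ordering of its rows; following Table 8, one error degree of freedom is charged for the estimated beta, leaving 5. Because the inputs are rounded, the total comes out near 2.35 rather than the printed 2.37, while the treatment and error sums of squares stay close to the ANCOVA values in Table 7.

```python
# Table 5 columns, in the order printed (four Group A rows, then four Group B rows).
ZY = [-.88, -.44, .00, .44, -1.32, -.44, .88, 1.76]
ZX = [-1.68, -.68, .31, 1.30, -.68, -.19, .56, 1.06]
GROUP = ["A"] * 4 + ["B"] * 4

n = len(ZY)
beta = sum(y * x for y, x in zip(ZY, ZX)) / (n - 1)   # pooled beta, about .813
ye = [y - beta * x for y, x in zip(ZY, ZX)]          # prediction errors (YE)

# One-way ANOVA on YE.
by_group = {}
for g, e in zip(GROUP, ye):
    by_group.setdefault(g, []).append(e)
grand_mean = sum(ye) / n
ss_total = sum((e - grand_mean) ** 2 for e in ye)
ss_treat = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values())
ss_error = ss_total - ss_treat
df_error = n - len(by_group) - 1   # Table 8 also charges one df for beta
f_treat = ss_treat / (ss_error / df_error)
print(round(ss_treat, 2), round(ss_error, 2), df_error, round(f_treat, 2))
# 0.04 2.31 5 0.08
```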






Table 9
ANOVA Associated with Figure 2

              Sum of          Mean       F       F
Source        Squares   df    Squares    Calc    Crit
Treatment        35      1    35.00      4.85    5.12
"Error"          65      9     7.22
Total           100     10






Table 10
ANCOVA Associated with Figure 2

              Sum of          Mean       F       F
Source        Squares   df    Squares    Calc    Crit
Covariate        20      1    20.00
Treatment        35      1    35.00      6.22    5.32
"Error"          45      8     5.62
Total           100     10
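The gain in Table 10 comes entirely from the error term: the covariate removes 20 units of error sum of squares (65 down to 45) at the cost of one degree of freedom (9 down to 8), so the error mean square falls from 7.22 to 5.62 while the treatment mean square stays at 35. A minimal sketch of that comparison follows; the critical values are copied from Tables 9 and 10 rather than computed, and the helper name `f_ratio` is illustrative.

```python
# Summary values for Figure 2 ("best case"), copied from Tables 9 and 10.
ANOVA = {"ss_treat": 35, "df_treat": 1, "ss_error": 65, "df_error": 9}
ANCOVA = {"ss_treat": 35, "df_treat": 1, "ss_error": 45, "df_error": 8}

# Critical F(1, df_error) at alpha = .05, as printed in the tables.
F_CRIT = {9: 5.12, 8: 5.32}


def f_ratio(table):
    """Treatment mean square divided by error mean square."""
    ms_treat = table["ss_treat"] / table["df_treat"]
    ms_error = table["ss_error"] / table["df_error"]
    return ms_treat / ms_error


for label, table in (("ANOVA", ANOVA), ("ANCOVA", ANCOVA)):
    f = f_ratio(table)
    significant = f > F_CRIT[table["df_error"]]
    print(label, round(f, 2), significant)
# ANOVA 4.85 False
# ANCOVA 6.22 True
```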






Table 11
ANOVA Associated with Figure 3

              Sum of          Mean       F       F
Source        Squares   df    Squares    Calc    Crit
Treatment        20      1    20.00      2.25    5.12
"Error"          80      9     8.89
Total           100     10
Table 12
ANCOVA Associated with Figure 3

              Sum of          Mean       F       F
Source        Squares   df    Squares    Calc    Crit
Covariate        30      1    30.00
Treatment         0      1      .00       .00    5.32
"Error"          70      8     8.75
Total           100     10
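In the worst case the shift runs the other way. Both tables divide the same total of 100 across 10 degrees of freedom, so, arithmetically, the covariate's 30 units consist of the 20 that the ANOVA credited to treatment plus 10 taken from error. The sketch below is a minimal illustration built only from the printed sums of squares; it checks each partition and recomputes the treatment F and effect size. The names `TABLES` and `summarize` are illustrative.

```python
# Sums of squares and df for Figure 3 ("worst case"), copied from Tables 11 and 12.
SS_TOTAL, DF_TOTAL = 100, 10
TABLES = {
    "ANOVA": {"Treatment": (20, 1), "Error": (80, 9)},
    "ANCOVA": {"Covariate": (30, 1), "Treatment": (0, 1), "Error": (70, 8)},
}


def summarize(rows):
    """Check that the partition adds up, then return (treatment F, effect size)."""
    assert sum(ss for ss, _ in rows.values()) == SS_TOTAL
    assert sum(df for _, df in rows.values()) == DF_TOTAL
    ss_t, df_t = rows["Treatment"]
    ss_e, df_e = rows["Error"]
    f_ratio = (ss_t / df_t) / (ss_e / df_e)
    return f_ratio, ss_t / SS_TOTAL


results = {name: summarize(rows) for name, rows in TABLES.items()}
for name, (f_ratio, effect) in results.items():
    print(name, round(f_ratio, 2), effect)
# ANOVA 2.25 0.2
# ANCOVA 0.0 0.0
```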










Table 13
Standardized Data for Five Variables

ID       ZY      ZX1      ZX2      ZX3      ZX4
 1     .790    1.422     .350     .322      ...
 2   -1.589     .112   -1.239   -1.094      ...
 3     .127    -.965     .271     .201      ...
 4   -1.656   -2.167    -.498    -.970      ...
 5     .176   -1.291     .153    2.393      ...
 6    -.017     .636   -1.607    -.168   -1.746
 7    -.397    -.173     .931    -.112   -1.704
 8    -.594     .532    -.108     .092     .127
 9     .846     .528    1.237    -.092     .035
10     .810     .642   -1.400    1.135    1.654
11    1.764     .373    1.290    -.543    1.005
12    -.260     .352     .620   -1.163     .989



Table 14
Bivariate Correlation Matrix

          ZY       ZX1      ZX2      ZX3      ZX4
ZY                 .497     .444     .384     .319
ZX1      24.7%              .018    -.074    -.004
ZX2      19.7%      .0%             -.054     .099
ZX3      14.7%      .5%      .3%              .103
ZX4      10.2%      .0%     1.0%     1.1%

Note. Bivariate r coefficients are presented above the diagonal.
Common variance (squared r) percentages are presented below the
diagonal.
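Because each column of Table 13 is standardized with n - 1 (the squared scores in a column sum to about 11), each r in Table 14 is simply the sum of cross products divided by n - 1, and the percentage below the diagonal is 100 times r squared. A minimal sketch using the ZY and ZX3 columns, both fully legible in the source, follows; the function name `r_from_z` is illustrative.

```python
# ZY and ZX3 columns of Table 13.
ZY = [.790, -1.589, .127, -1.656, .176, -.017, -.397, -.594, .846, .810, 1.764, -.260]
ZX3 = [.322, -1.094, .201, -.970, 2.393, -.168, -.112, .092, -.092, 1.135, -.543, -1.163]


def r_from_z(za, zb):
    """Pearson r for z-scores standardized with n - 1: sum of products / (n - 1)."""
    n = len(za)
    return sum(a * b for a, b in zip(za, zb)) / (n - 1)


r = r_from_z(ZY, ZX3)
common_variance = 100 * r ** 2   # percentage shown below the diagonal of Table 14
print(round(r, 3), round(common_variance, 1))
# 0.384 14.7
```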
Figure 1
Scattergram of ANCOVA Data

Figure 2
ANCOVA Best Case

Figure 3
ANCOVA Worst Case